Algorithms for Identifying Structural Variants in Human Genomes
Authors: Anna Ritz
Abstract
“Algorithms for Identifying Structural Variants in Human Genomes” by Anna Ritz, Ph.D., Brown University, May 2013

Variation in genomes occurs in many forms, from single nucleotide changes to gains and losses of entire chromosomes. Large-scale rearrangements, called structural variants (SVs), are associated with numerous diseases and are common in cancer genomes. However, many SVs in mammalian genomes lie in highly repetitive regions, which complicates their detection and characterization. The ongoing development of genomic technologies invites new algorithmic approaches to SV detection. In this thesis, we present four algorithms that identify SVs using data from current and emerging genomic technologies.

The first algorithm is designed for a technology called array comparative genomic hybridization (aCGH), which measures the number of copies of DNA segments present in a test genome relative to a known reference genome. aCGH data are useful for measuring deletions and duplications, and they can be obtained for thousands of individuals from a single population or disease group. We describe a method to identify SVs that are common to a group of individuals and apply it to aCGH data from cancer patients. We recover an SV in prostate cancer that is known to be biologically important, and we infer a number of novel SVs in brain cancer.

Our other algorithms are designed for DNA sequencing technologies, which measure a broader range of SVs than aCGH data, at a higher cost. One DNA sequencing technology, strobe sequencing, yields multiple sequences from a single contiguous fragment of DNA. While strobes provide longer sequenced portions of DNA than other sequencing technologies, their per-base error rate is substantially higher. Our algorithms for SV detection exploit the benefits of these multiply-linked DNA sequences while remaining robust to high sequencing error rates.
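As a rough illustration of the input and output involved in finding aCGH aberrations shared by a group, the sketch below converts each probe's log2 copy-number ratio (test vs. reference) into a gain, loss, or neutral call, then reports runs of adjacent probes where at least a given number of samples share the same call. This is a simple frequency-based heuristic, not the method developed in the thesis; the ±0.3 thresholds, the function names, and the toy cohort are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical call thresholds on log2(test/reference); real pipelines tune these.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def log2_ratio(test: float, reference: float) -> float:
    """Log2 copy-number ratio of one probe in the test genome vs. the reference."""
    return math.log2(test / reference)


def call_probe(ratio: float) -> int:
    """Return +1 for an apparent gain, -1 for an apparent loss, 0 otherwise."""
    if ratio >= GAIN_THRESHOLD:
        return 1
    if ratio <= LOSS_THRESHOLD:
        return -1
    return 0


def recurrent_segments(ratios: List[List[float]], kind: int,
                       min_samples: int) -> List[Tuple[int, int]]:
    """Return maximal runs of consecutive probes where at least `min_samples`
    samples share aberration `kind` (+1 gain, -1 loss).

    ratios[s][p] is the log2 ratio of probe p in sample s; probes are assumed
    to be sorted by genomic position. Intervals are half-open (start, end).
    """
    n_samples = len(ratios)
    n_probes = len(ratios[0])
    segments: List[Tuple[int, int]] = []
    start = None
    for p in range(n_probes):
        # Count samples whose call at this probe matches the requested kind.
        hits = sum(1 for s in range(n_samples) if call_probe(ratios[s][p]) == kind)
        recurrent = hits >= min_samples
        if recurrent and start is None:
            start = p  # open a new run
        elif not recurrent and start is not None:
            segments.append((start, p))  # close the current run
            start = None
    if start is not None:
        segments.append((start, n_probes))  # run extends to the last probe
    return segments


# Toy cohort: 3 samples x 5 probes of log2 ratios (hypothetical values).
cohort = [
    [0.0, -0.8, -0.9, -0.7, 0.1],
    [0.1, -0.6, -1.0, 0.0, 0.0],
    [-0.1, -0.7, -0.8, -0.5, 0.2],
]
print(recurrent_segments(cohort, kind=-1, min_samples=2))  # → [(1, 4)]
```

Here probes 1 to 3 are called as losses in at least two of the three samples, so they form one shared candidate deletion. A fixed threshold and count ignores probe noise and breakpoint uncertainty; it only shows the shape of the problem (samples by ordered probes in, probe intervals out).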
We describe the first published method for SV detection using strobe sequencing, which finds the smallest number of SVs (relative to a known reference genome) that explain the strobes. We then improve upon our method with a probabilistic algorithm that better models the strobe sequencing data. Finally, we describe a de novo assembly algorithm for strobe sequencing data when a reference genome is unavailable. We assess the performance of these algorithms on simulated and real strobe sequencing data, and conclude that with appropriate algorithms, strobe sequencing compares favorably to other DNA sequencing technologies.
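Choosing the smallest number of SVs that explain a set of strobes can be viewed as a set-cover problem: each candidate SV explains some subset of the strobes, and the goal is a small collection of candidates whose subsets together cover every strobe that any candidate explains. Minimum set cover is NP-hard in general, so the sketch below uses the standard greedy approximation instead of an exact solver. It assumes the candidate-to-strobe mapping has already been computed from the alignments, and the SV and strobe identifiers are hypothetical; it is not presented as the algorithm used in the thesis.

```python
from typing import Dict, List, Set


def greedy_min_sv_cover(candidates: Dict[str, Set[str]],
                        strobes: Set[str]) -> List[str]:
    """Greedily pick candidate SVs until every explainable strobe is covered.

    `candidates` maps a hypothetical SV identifier to the set of strobe IDs
    that SV would explain. Strobes explained by no candidate are ignored.
    """
    # Only strobes that at least one candidate explains can ever be covered.
    uncovered = set().union(*candidates.values()) & strobes
    chosen: List[str] = []
    while uncovered:
        # Take the SV explaining the most still-unexplained strobes; iterating
        # over sorted IDs makes max() break ties by the smallest identifier.
        best = max(sorted(candidates), key=lambda sv: len(candidates[sv] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical candidates and strobes; "s6" is explained by no candidate.
candidates = {
    "del_chr1_A": {"s1", "s2", "s3"},
    "inv_chr1_B": {"s3", "s4"},
    "del_chr1_C": {"s4"},
    "dup_chr2_D": {"s5"},
}
strobes = {"s1", "s2", "s3", "s4", "s5", "s6"}
print(greedy_min_sv_cover(candidates, strobes))
# → ['del_chr1_A', 'del_chr1_C', 'dup_chr2_D']
```

The greedy rule is guaranteed to stay within a logarithmic factor of the optimal number of SVs; an exact formulation (for example, an integer linear program) would be needed to certify a true minimum.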
Similar resources
Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes
Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing techn...
Computational approaches for identifying deleterious nonsynonymous variants from human genomes
Identification of the causal genomic variants that alter human phenotypes, particularly those that lead to diseases, is the central goal of human genetics studies. Genome-wide studies have identified several hundreds of common variants associated with complex human diseases and traits. Despite these successes, most of the common variants only have a small individual contribution to the estimate...
A geometric approach for classification and comparison of structural variants
Motivation: Structural variants, including duplications, insertions, deletions and inversions of large blocks of DNA sequence, are an important contributor to human genome variation. Measuring structural variants in a genome sequence is typically more challenging than measuring single nucleotide changes. Current approaches for structural variant identification, including paired-end DNA sequencin...
Identifying structural variants using linked-read sequencing data
Motivation: Structural variation, including large deletions, duplications, inversions, translocations, and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints i...
Education Chapter 6: Structural Variation and Medical Genomics
Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), have received considerable attention in the past several years as they are a previously underappreciated ...